At the onset of this grant period, the PI proposed to conduct a study of hardware aware
low  power  radio  architectures  and  protocols.  It  had  become  increasingly  clear  that  while 
the  communications  community  had  notable  ideas  for  constructing  networks  of  simple  low 
power  radios  that  could  be  used  for  ubiquitous  sensing  tasks,  many  of  these  ideas  actually 
required  the  construction  of  high  power  radios.  Moreover,  the  hardware  and  circuit 
design  community  was  far  more  interested  in  high  performance  of  communication 
architectures  than  in  trying  to  develop  their  own  low  power  protocols  that  made  sense  in 
hardware.  As  a  result,  the  PI  took  on  the  task  of  merging  ideas  from  communications  with 
low  power  hardware  design  as  part  of  an  exploration  of  co-optimized  architectures  for  real 
self-powered radio systems. The result has been a comprehensive study of CW
(continuous wave) and UWB (ultra-wideband) approaches, which has led to the
development of a novel UWB architecture that solves many of these problems and
may be used in ubiquitous sensor networks.

Scientific contributions for the period of this grant are divided into three topics. The first
is low power UWB transceiver design. The second is the design of PLL architectures for
low power CW transceivers. The third is the design of new oscillator topologies for
low power CW transceivers.

Previous  technical  reports  have  addressed  these  topics,  and  some  of  that  material  is 
contained  in  this  final  report.  However,  since  the  last  report  for  2007,  there  have  been 
significant  advances  on  the  topic  of  UWB  design,  requiring  extension  of  this  topic  as  a  focus 
of  this  final  report. 

Topic  1:  PCO  based  UWB  radios  for  ultralow  power  operation. 

Background  of  UWB  radio: 

Communication networks that enable data to be passed wirelessly with little to no cost in power
have high potential impact for many applications [1]. Such a network can make communication,
sensing, and monitoring unobtrusive, and essentially free, since all of the power required for such
operations can be harvested from the environment or taken from an ultralight energy source,
reducing the size and weight of network nodes. A number of ideas and insights have been
investigated at the hardware and network levels to dramatically reduce the power required for
communications; however, this work has not yet produced a continuously operating microwatt
radio capable of self-organizing to pass a message [2,3,4]. The networkable radio node described
in this report overcomes this barrier by leveraging low duty cycle UWB transmission and a new,
biologically inspired algorithm using pulse coupled oscillators to regulate communication. The
result is an FCC compliant, self-powerable (less than 30uW) radio node capable of robust
operation while transmitting information over distances via an ad-hoc network.

Figure 1 UWB transmission and receiving scheme.

It is well known that the time-limited, wide spectrum signaling in UWB promises greater
network capacity over traditional radio architectures as well as low detectability [1]. Low duty
cycle UWB transmissions are also beneficial for ultra-low power “impulse” radios [1,2,3]. The
reasoning is intuitively clear: UWB pulses, due to their short duration, allow a synchronized
transmitter and receiver to turn off their radio-frequency (RF) subsystems for long times between
pulses, as in Fig. 1. By using this concept as the basic premise for a radio, very low power
communication, orders of magnitude below state-of-the-art approaches, can be achieved.

While design of an efficient transmitter is straightforward, and has been the subject of previous
papers [2,3,4], building a synchronized receiver, or a network of synchronized receivers capable of
the same 0.1% duty cycling without missing data, is not. Among other
characteristics, the receiver, and thus the full transceiver, must be able to lock in phase to others in
the network in the presence of frequency mismatch. Phase locking must be accurate to well below
the scale of the duty cycle, must occur within a few cycles, and must occur even as nodes in the
network fail. Furthermore, the locking scheme must lock the entire network, rather than
allowing communication only between pairs of nodes.

Many  forms  of  communication  rely  on  a  high  degree  of  synchrony  between  transmitter  and 
receiver  to  convey  information.  The  examples  are  numerous:  coherent  FM  receivers  utilize  phase 
locked  loops,  direct  spread  spectrum  techniques  are  based  upon  modulating  and  demodulating  a 
baseband  signal  with  a  synchronized  chip  sequence,  optical  links  feature  clock  and  data  recovery 
receive  circuitry,  and  likewise  ultra-wideband  (UWB)  radio  relies  on  receiver  and  transmitter 
synchrony.  This  last  case  of  UWB  radio  poses  some  interesting  and  unique  problems  and  is  the 
focus  of  this  work.  In  this  background  section,  we  will  introduce  UWB  radio,  illustrate  some 
synchronization  constraints  necessary  to  its  successful  function,  describe  traditional  solutions  to 
the  problem  and  their  limitations,  introduce  our  novel  proposed  solution  to  the  synchronization 
problem,  and  show  preliminary  results  from  construction  of  a  prototype  system. 

Ultra-wideband (UWB) radio is a method of RF/wireless communications utilizing short
duration pulses instead of a continuous wave sinusoid to transmit information (Fig. 1). It is well
known that the time-limited, wide spectrum signaling in UWB promises greater network capacity
over traditional radio architectures, allowing superior data-rate and spatial capacity at similar
power consumption over short distances. The short pulse signaling also allows duty cycling of the
RF front end to save power (Fig. 1). Due to the possible power savings of this signaling
scheme, we have chosen to investigate a version of this type of radio as a means of dramatically
reducing power consumption in sensor node radios. Achieving the benefits of ultra-wideband
communications, however, is contingent on precise synchronization between transmitter and
receiver such that transmitted pulses are received. For instance, if a transmitter and receiver are
both run at low duty cycles but are not synchronized to the same clock, the receiver may be
inactive when a pulse is transmitted and miss the data. However, if the two are synchronized,
then the receiver will be able to capture the pulse even as the receive duty cycle is reduced.

A popular practical implementation of synchronization uses a high speed DLL/PLL
in conjunction with a digital pulse tracking backend that maintains synchronization throughout the
period of communications. An example of this architecture is adapted from Brodersen and
O’Donnell at UC Berkeley. There are several drawbacks to this approach. One is that the
receiver and transmitter clocks must have center frequencies matched on the
order of tens to hundreds of parts per million to maintain adequate synchronization, thereby
necessitating that the local oscillators of both the transmitter and receiver be referenced to well
matched crystals so that frequency drift between them is minimized. This requirement for a
crystal imposes a significant cost on a system that a manufacturer would ideally like to avoid. A
second drawback is the lack of scalability of this architecture. In this scheme, there is a well
defined leader node with other nodes following or locking to this node. There is no feedback
from follower nodes to the leader. As a result, the synchronization scheme will only work within
the reach of the leader node. If a node cannot hear the leader, it will not synchronize to the
network. Furthermore, if the leader node should fail, the entire network will also fail.

Our  transceiver  architecture  uses  the  natural  phenomenon  of  pulse  coupled  oscillators  (PCOs) 
to  replace  the  external  crystal  as  the  frequency  reference  source  in  node  to  node  communications. 
Collections of nodes using the PCO system have been rigorously proven to synchronize in a
self-organizing manner, thereby generating a global clock that is common to the communicating
nodes. The PCO system is scalable, and the network will self-recover from any node joining or
leaving. With a global clock established, node to node
communications can then be established based on that global clock.

Establishing  a  Global  Clock 

A robust, low power, and simple synchronization scheme is the key element needed to improve upon
existing state of the art UWB systems. This section describes a unique synchronization
scheme that we arrived at after studying several PLL based schemes. This scheme
enables precise cycling of all nodes via a global signaling clock. Combined with a new node
architecture, it enables a robust network constructed from low cost and low power modules.

The solution comes from examination of a natural non-linear phenomenon observed in Asian
fireflies. Mirollo and Strogatz [5] have modeled this system as a network of pulse coupled
oscillators that mutually influence each other towards a locked global frequency standard.
Collections of nodes using the PCO system have been rigorously proven to synchronize in a
self-organizing manner, generating a global clock that is common to the communicating nodes [5].
The PCO system is composed of oscillators following a state function, as shown in Fig. 2 for a
two oscillator case (A and B). For our purposes, the output of an oscillator i is a variable $V_i$ that is
a function of a normalized time, $\phi_i = t_i/T_0$, where $t_i$ is the time since oscillator i last reset and $T_0$ is
the time a free running oscillator takes to complete a period; $\phi_i$ thus represents a unit phase of
sorts. All oscillators start at random initial points on the state curve and travel along the curve at a
constant and identical rate, as depicted by A and B in Fig. 2(a). When an oscillator completes a period, it
fires and emits an instantaneous coupling $\Delta V$ to every other oscillator in the system (Fig. 2(b)),
causing them to advance along the state curve by $\Delta V$ and its associated $\Delta\phi$. The transmitting
oscillator then resets to $t_i = 0$. Mirollo and Strogatz show that if the state function is monotonically
increasing and concave down, then the system of oscillators perfectly phase-locks, and hence the
firing times also synchronize. Each firing drives the oscillators’ phases closer together through the
nonlinearity of the state function. A thorough treatment of the dynamics of this system can be
found in [5]. The PCO network will self-recover from any node joining or leaving, making it a
natural paradigm for an ad-hoc network.
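
The firing and coupling rules described above can be illustrated with a small event-driven simulation. This is a minimal sketch, not the circuit model used in this work: the logarithmic state function, the curvature parameter b, the coupling strength eps, and all function names are assumptions chosen only because they satisfy the "monotonically increasing and concave down" condition.

```python
import math
import random

def state(phi, b=3.0):
    """Assumed state function: increasing, concave down, state(0)=0, state(1)=1."""
    return math.log(1.0 + (math.exp(b) - 1.0) * phi) / b

def phase(v, b=3.0):
    """Inverse of state(): maps a state value back to a normalized phase."""
    return (math.exp(b * v) - 1.0) / (math.exp(b) - 1.0)

def fire(phases, eps=0.1, b=3.0):
    """Advance all oscillators to the next firing, then apply pulse coupling."""
    lead = max(phases)
    phases = [p + (1.0 - lead) for p in phases]   # identical, constant rate
    fired = {i for i, p in enumerate(phases) if p >= 1.0 - 1e-12}
    pending = len(fired)                          # pulses not yet delivered
    while pending:
        newly = set()
        for i, p in enumerate(phases):
            if i in fired:
                continue
            v = state(p, b) + pending * eps       # each pulse adds eps of state
            if v >= 1.0:
                newly.add(i)                      # pushed over threshold: fires too
            else:
                phases[i] = phase(v, b)
        fired |= newly
        pending = len(newly)
    # Every oscillator that fired resets to phase zero.
    return [0.0 if i in fired else p for i, p in enumerate(phases)]

def firings_to_sync(n=3, eps=0.1, b=3.0, seed=1, limit=1000):
    """Count firing events until all oscillators reset together."""
    rng = random.Random(seed)
    phases = [rng.random() for _ in range(n)]
    for k in range(1, limit + 1):
        phases = fire(phases, eps, b)
        if max(phases) == 0.0:
            return k
    return None  # did not synchronize within the limit
```

Calling firings_to_sync() with different seeds or node counts gives a rough feel for how quickly a small, ideal, fully connected network locks. The sketch ignores propagation delay, frequency mismatch, and noise, all of which matter in the fabricated radio.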


Figure 2 Monotonically increasing concave down state function used to establish global
synchronization. (a) A and B are oscillators starting at different voltage and phase points on
the state curve; A fires before resetting. (b) Oscillator A fires, advancing the phase of B.

An  oscillator  circuit  realizing  the  state  function  in  figure  2  is  shown  in  figure  3(a)  and 
described in detail in [6]. Capacitor C1 charges up with an RC-like characteristic, generating an
output  pulse  from  the  pulse  generation  circuit  and  resetting  node  A  back  to  zero  when  the  next 
inverter  threshold  is  crossed.  The  generated  pulse  is  transmitted  to  surrounding  nodes  while 
pulses  from  other  nodes  are  received  and  coupled  into  the  PCO  circuit  through  the  “couple”  input. 
The  “couple  inhibit”  signal  prevents  self-coupling. 

A simulation of the behavior of 3 of these coupled nodes (representing 3 radios) is shown in
figure 3(b). Each time a node reaches peak voltage, it couples charge to the PCO circuits in
neighboring nodes, quickly synchronizing all three nodes. This coupling continues after
synchronization has occurred to maintain a time scale for the network, preventing frequency or
phase drift. It is critical to note that unlike PLL based systems, which employ a leader node to
transmit a clock signal while surrounding nodes lock to this signal, the PCO system of nodes is
distributed and will synchronize without a well defined leader. All nodes behave in an identical
fashion, so any node can be eliminated from the network, and the nodes are therefore able to
produce a scalable synchronized network. The network is not limited to the reach of a single
leader node. Any node that is within reach of the network will synchronize to it through
non-linear feedback. With a global clock established and a means of transmitting and detecting
pulses (described in the next section), the basics of node to node communications can proceed.



Figure  3:  (a)  CMOS  implementation  of  PCO  circuit  (b)  Simulation  of  3  PCO  circuits 
achieving  synchronization 

As a proof of concept system, we constructed 3 simple radio nodes in a 180nm CMOS process
that realize the system shown in figure 3. In addition to the circuit in figure 3(a), these nodes also contain
a basic transmitter and antenna driver, a simple receiver front end, and basic pulse detection
circuits. The full circuit/node constructed is shown in figures 4 and 5 below. Figure 4 shows the
transmit circuits used, the pulse shaping, and the antenna driver. Figure 5 shows the topology of
the pulse detection circuitry. The goal of this chip is to demonstrate physical synchronization
based upon the simple principle of Mirollo and Strogatz in a realistic RF radio environment. The
results of this study, shown briefly in this report, are published in [11] and discussed in [12].


Figure 4 Transmission circuits.


Figure  5  Pulse  detection  circuit,  follows  basic  receiver  circuit  not  shown  here.  Output  of 
pulse  detection  drives  the  couple  input  of  the  PCO  circuit  shown  above. 

We  fabricated  our  PCO  based  transceiver  chip  in  the  IBM  CMOS7RF  process.  We  demonstrate 
synchronization  between  multiple  nodes  for  a  UWB  impulse  radio  network  using  3  of  these  chips. 
We select 3 nodes for the demonstration to show the scalability of our scheme as compared to
other work, where the synchronization scheme is limited to point-to-point communication. We
demonstrate  synchronization  between  these  nodes  through  a  wireless  medium  using  wideband 
monopole  antennas.  In  our  test  setup  we  configure  the  three  transceiver  nodes  identically  and 
measure  the  relative  timing  of  the  PCO-clock  edges  to  assess  the  quality  of  synchronization 
between  the  three  different  nodes.  In  Fig.  6a,  we  show  the  relative  timing  between  successive 
edges  when  the  nodes  are  free  running,  while  in  Fig.  6b  we  show  the  relative  timing  between 
successive edges when the nodes are synchronized. Due to process variation, the free running
frequencies of the three nodes are not identical (Fig. 6a), with median periods of
136, 137.2, and 138 ns respectively. However, after turning on the coupling between the
nodes, the internal PCOs run at the same rate, with a median period of 135 ns for all three nodes.
Fig. 6b also shows the low jitter nature of the PCO system after lock: the cycle-to-cycle
period varies only within ±1 ns, making the PCO system a reasonable clock source when
synchronized.


We show further proof of node synchronization by looking at the eye of the internal PCO
oscillator of node-2 and node-3 when triggered with respect to node-1 (Fig. 7). When the nodes
are not synchronized the eye for node-2 and node-3 is completely closed (Fig. 7a), while in the
synchronized case it is open (Fig. 7b). When synchronized, communication between nodes can
be precisely timed, enabling the receiver to be shut off when a signal is not expected. The phase
offset between different nodes at the time of synchronization is dictated by receiver and
transmitter delay variation as well as by their relative distance from the centroid of the node
network, and hence the nodes are not expected to be exactly aligned with each other.

We also look at the relative phase statistics of the two oscillator nodes with respect to the
positive edge transition of the first node, to see how well they maintain relative phase with
respect to each other. The cycle-to-cycle uncertainty can be quantified by looking at
the phase histograms (Fig. 8). The cycle-to-cycle jitter when the nodes are synchronized is very low,
with the standard deviation of the zero crossing close to 0.01 UI (1% of the period) (Fig. 8).
This means the scheme can be utilized for an aggressively duty-cycled IR-UWB system. As the
cycle-to-cycle uncertainties are on the order of 1%, the system can be duty-cycled to roughly that level,
resulting in significant power savings with a low bit error rate.

Fig. 9 shows a general description of a transceiver node utilizing this PCO circuit. Each node has
an  interface  with  the  physical  channel  in  both  transmit  and  receive  and  a  method  to  extract  the 
synchronization  pulse  from  the  received  information.  Each  node  implements  the  PCO  as  well, 
whereby  the  global  clock  in  the  system  is  created.  This  global  clock  is  used  by  the  network  to  time 
all  communications. 


Figure  9  General  Transceiver  Node 

A.  Basic  Radio  and  Communication  Architecture 

The  basic  radio  architecture  described  in  figure  9  has  several  key  features.  While  the 
communication  scheme  used  in  the  radio  is  flexible,  we  have  developed  it  based  on  a  version  of 
time  domain  multiplexing  shown  in  figure  10  where  the  presence  or  absence  of  spikes  in  a  given 
window  encodes  data.  As  a  result,  the  radio  must  perform  several  tasks  to  enable  communication. 
The  radio  must  receive  and  detect  data  and  synchronization  impulses  from  the  channel.  This  is 
done  using  a  dual  band  receiver,  described  in  section  V,  searching  for  energy  in  either  a  “data”  or 
a  “sync”  band.  Impulses  received  are  then  thresholded  by  a  peak  detector  circuit  described  in  [6] 
and  passed  either  to  the  PCO  coupling  input  or  to  the  data  path. 

Figure 10 Time domain multiplexing method in UWB.

In  order  to  recover  the  meaning  of  the  data  input,  the  impulses  must  be  binned.  Binning  is 
established  via  a  divided  down  version  of  the  PCO  clock.  The  period  between  synchronized 
impulses  from  the  PCO  represents  a  “frame”.  Each  “frame”  is  divided  into  “bins”  in  which  the 
presence  or  absence  of  an  impulse  represents  an  encoding  defined  by  the  radio.  By  positioning 
pulses  in  bins  within  a  single  frame  or  over  several  frames  (a  packet)  data  can  be  uniquely 
encoded. 
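
As an illustration of the frame and bin idea, the sketch below encodes one bit per frame by the presence or absence of a pulse in a fixed data bin. It is only an assumed example encoding: the frame length, the use of 128 bins, the bin positions, and the function names are hypothetical and are not taken from the radio's actual protocol.

```python
FRAME_NS = 7000.0   # assumed frame period (roughly 7 us)
N_BINS = 128        # assumed number of bins per frame
SYNC_BIN = 0        # hypothetical bin holding the sync pulse
DATA_BIN = 64       # hypothetical bin holding the data pulse
BIN_NS = FRAME_NS / N_BINS

def bin_index(t_ns):
    """Bin in which a pulse arriving at absolute time t_ns falls."""
    return int((t_ns % FRAME_NS) // BIN_NS)

def encode(bits):
    """One frame per bit: a sync pulse every frame, a data pulse only for a 1."""
    pulses = []
    for k, bit in enumerate(bits):
        start = k * FRAME_NS
        pulses.append(start + (SYNC_BIN + 0.5) * BIN_NS)
        if bit:
            pulses.append(start + (DATA_BIN + 0.5) * BIN_NS)
    return pulses

def decode(pulses, n_frames):
    """Recover bits by checking for a pulse in the data bin of each frame."""
    bits = [0] * n_frames
    for t in pulses:
        if bin_index(t) == DATA_BIN:
            bits[int(t // FRAME_NS)] = 1
    return bits
```

For example, decode(encode([1, 0, 1, 1]), 4) recovers the original bit list. Richer encodings that place pulses in different bins, or spread a symbol over several frames, follow the same binning step.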

Enabling this communication scheme requires some additional circuitry in the radio node. The
most notable addition is a very low frequency, very low jitter PLL based frequency synthesizer
capable of producing a number of bins within each frame. In our design, we use a divide by 128
frequency divider in the feedback loop of a relaxation oscillator based PLL to achieve “binning”
with a steady state power consumption of 2.5 uW. By using a relaxation oscillator VCO, we
virtually eliminate the deterministic jitter in the PLL due to oscillator non-linearity. (This new
PLL design is quite an interesting discovery in itself, but in order to limit the length of this report I
will not discuss it in detail.)

Using  this  circuit,  the  existence  and  location  of  impulses  can  be  regulated  and  verified  by  simple 
logic  or  a  back  end  processor.  It  is  critical  to  note  that  under  this  synchronization  and 
communication  scheme,  the  radio  receiver  will  always  know  a  priori  when  it  may  expect  to  see  an 
impulse.  Although  an  impulse  may  not  always  appear  in  this  window,  if  the  timing  is  correct,  the 
impulse  will  never  appear  outside  of  the  expected  window,  so  data  is  never  missed,  as  in  figure  1. 

B.  Duty  Cycling  the  Front  End 

This  section  defines  the  mechanism  to  power  cycle  the  receiver  front  end.  This  operation  is 
performed  through  a  simple  series  of  low  power  DLL  operations.  Once  synchronized,  the 
receiver  and  transmitter  nodes  are  time  matched  to  within  a  bin  and  turn  off  the  RF  subsystems  for 
all  but  2  bins  (the  known  data  bin  and  the  synchronization  bin).  This  lowers  the  duty  cycle  and  the 
RF  power  consumption  to  2/Nbins.  While  this  allows  for  a  few  percent  duty  cycle,  it  is  not 
aggressive  enough  to  achieve  microwatt  power  levels. 
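
A quick calculation shows why bin-level gating alone is not enough. This sketch assumes 128 bins per frame (suggested by the divide by 128 PLL, but not stated explicitly) and uses the 6.5 mW simulated receiver power quoted in the Transmitting and Receiving section; the function name is hypothetical.

```python
def rf_duty_cycle(n_bins, active_bins=2):
    """Fraction of each frame the RF front end is on with bin-level gating."""
    return active_bins / n_bins

N_BINS = 128              # assumption: one bin per divide-by-128 output cycle
RX_ON_POWER_W = 6.5e-3    # simulated always-on receiver power from the report

duty = rf_duty_cycle(N_BINS)           # 2/128, about 1.6%
avg_rx_power_w = RX_ON_POWER_W * duty  # about 100 uW, far above microwatt levels
```

Under these assumptions the receiver would still average roughly 100 uW, which motivates the secondary acquisition step.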

Secondary acquisition occurs to further reduce the duty cycle. Delay locked loops (each
consuming 0.11uW in a 90nm process) trigger on the rising edge of the bin in which a pulse is expected.
The delay locked loops also lock to the arrival time of the pulse, to close the window and turn on
the RF amplifier shortly before the pulse is expected to arrive. The same process occurs in the
sync bin. This generates a very tight window of time when the RF system is on around the
anticipated arrival time of the pulse, as in Fig. 1.

Note  that  the  logic  block  in  figure  9  expects  a  control  signal  from  an  outside  controller.  This 
block  is  an  essential  piece  of  our  UWB  system  and  must  also  be  designed  for  ultra-low  power 
consumption.  The  external  controller  is  responsible  for  interfacing  and  processing  information  for 
any  sensor,  maintaining  the  state  of  the  system,  implementing  a  suitable  encoding  scheme  for 
transmission,  and  recording  the  detected  data  pulse.  Since  all  pulse  detection  and  processing 
functions  are  implemented  on  chip,  the  controller  only  needs  to  run  at  the  pulse  rate,  the  slowest 
timescale  in  the  system. 

Transmitting  and  Receiving 

As noted in previous sections, a power efficient and reliable FCC compliant channel interface is a
critical aspect of our design. To eliminate the impact of interference between data and
synchronization information, we utilize a dual band signaling scheme. For compliance with the
FCC mask, we employ a wide band transmitter with a tunable bandwidth from 500MHz to 1GHz
(1-2ns pulse duration) and a center frequency tunable from 3.5 to 4.5 GHz. Simulation of our
transmitter output spectrum was performed by connecting the transmitter output to a 50ohm load
through a 1pF capacitor (to model a 50ohm antenna impedance) and taking the FFT of the
Cadence output over several cycles. The results are shown in figure 11 for a 150Kpulse/sec rate.




Figure  11  Transmission  power  spectrum.  FCC  mask  is  indicated  by  the  dashed  line. 


Based  upon  simulation,  the  transmission  power  is  compliant  with  the  FCC  UWB  standards  and 
within  the  FCC  mask  (dashed  line).  There  is  also  virtually  no  power  overlap  in  synchronization 
and  data  transmissions.  The  actual  transmitter  circuit  is  designed  as  a  class  C  amplifier  that 
ANDs  a  fast  startup  duty  cycled  LC-VCO  output  with  a  trigger  pulse  to  produce  a  wavelet  output 
of  tunable  frequency.  The  simulated  power  consumption  of  the  combined  circuit  is  3.3uW  at 
150Kpulse/sec  data  rates,  with  2uW  of  power  delivery. 
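
The simulated transmitter figures can be restated as energy per pulse. The short calculation below is a derived illustration only; it assumes that "power delivery" means average power delivered to the load, and the variable names are hypothetical.

```python
TX_POWER_W = 3.3e-6         # simulated transmitter power from the report
DELIVERED_POWER_W = 2.0e-6  # assumed to be average power delivered to the load
PULSE_RATE_HZ = 150e3       # 150 Kpulse/sec

energy_per_pulse_j = TX_POWER_W / PULSE_RATE_HZ            # about 22 pJ
delivered_per_pulse_j = DELIVERED_POWER_W / PULSE_RATE_HZ  # about 13 pJ
efficiency = DELIVERED_POWER_W / TX_POWER_W                # about 0.61
```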

Likewise, the receiver circuit is composed of a five-stage tunable wide band amplifier, a single
stage of which is shown in figure 12. Each stage uses a tunable LC load on a differential CS
amplifier, optimized for switching transients of less than 2ns. Simulation indicates a gain of 39dB
for the combined receiver with a steady state power consumption of 6.5mW in a 90nm IBM
process and center frequency tuning from 3.5-4.5GHz. Short transient decay times enable the
receiver to be duty cycled to 0.1%, enabling average power consumption of approximately 11uW,
including leakage.


Figure  12  Single  stage  of  5-stage  dual  band  receiver  circuit. 


Figure  13  Operation  of  the  dual  band  transceiver  front  end.  (a)  sequential  transmission 
of  data  and  sync  pulses  in  different  RF  bands  (b)  peak  detected  output  when  the  receiver  is 
tuned  to  data  (c)  peak  detected  output  when  the  receiver  is  tuned  to  sync. 

Figure 13 shows a Cadence circuit simulation of the dual band transceiver operation described
above in a 90nm IBM process. In 13(a) the transmitter sends two sequential impulses into the
channel, the first in the data band and the second in the sync band. Each has different frequency
content. The receiver sees only the transmissions in the band to which it is tuned: the
data band in 13(b) and the sync band in 13(c). The output of the receiver is then thresholded by a peak
detector, which outputs a pulse (shown in each case) that is passed either to the PCO circuit or
through the data path, completing communication.

System  Results  and  Conclusions 


The  system  described  above  implements  an  ultra-low  power  impulse  radio  based  upon  the 
principle  that  low  duty  cycle  communication,  enabled  by  a  robust  network  clocking  scheme,  will 
provide  dramatic  power  savings  on  the  front  end.  We  designed  and  simulated  the  layout  for  the 
radio  described  in  figure  9  (based  on  IBM  90nm).  From  this,  we  calculate  the  expected  steady 
state power consumption for our radio front end to be 26.4uW, with 11uW required for the
receiver and 15.4uW required for everything else. The latter includes the continually operating
digital portion of the radio, all timing control circuits, DLLs, PLLs, logic, peak detectors, and
the  duty  cycled  transmitter.  This  power  may  scale  somewhat  in  more  aggressive  technology 
nodes,  but  will  eventually  be  limited  on  the  low  end  by  leakage. 

The receiver front end consumes approximately 11uW steady state based upon the assumption
that the receiver will be “on” for 10ns of the 7us frame (although each pulse is 1-2ns long) to
account for channel delay, receiver settling time, and use of 2 pulses per cycle. During the “off”
cycle, power is contributed by leakage current only. Lower power can be achieved at data pulse
rates below 150kHz. However, scaling in feature size will not offer much benefit for this circuit,
as gain is a critical metric for performance.
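
The receiver and system power figures can be cross-checked with a few lines of arithmetic. In this sketch the leakage value is not a number from the report; it is simply the residual implied by the quoted 11 uW receiver average, and the variable names are hypothetical.

```python
RX_ON_POWER_W = 6.5e-3   # simulated receiver power while on
ON_TIME_S = 10e-9        # receiver on for 10 ns per frame
FRAME_S = 7e-6           # frame length of about 7 us (150 kHz pulse rate)
RX_AVG_QUOTED_W = 11e-6  # quoted receiver average, including leakage
OTHER_W = 15.4e-6        # quoted power for everything else

duty = ON_TIME_S / FRAME_S                        # about 0.14%
rx_active_w = RX_ON_POWER_W * duty                # about 9.3 uW while gated on
implied_leakage_w = RX_AVG_QUOTED_W - rx_active_w # about 1.7 uW (derived, not quoted)
total_w = RX_AVG_QUOTED_W + OTHER_W               # 26.4 uW front-end total
```

The active-window term alone accounts for most of the 11 uW, which is consistent with the statement that lower pulse rates reduce receiver power further.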

To  the  best  of  our  knowledge,  these  operating  power  levels  are  far  below  any  other  published  to 
date,  and  would  enable  continuous  battery  operation  for  years  or  use  with  scavenging  power 
supplies  where  high  data  rates  are  not  required.  This  capability  would  open  up  a  wide  variety  of 
new  applications  in  sensing,  monitoring,  and  situational  awareness  throughout  medical  and 
military  fields. 

Topic 2: PNUP: A Technique for Low-Power and Low-Phase Noise Phase-Locked Loop
Design

We present our work on low power, low noise CW radio design in this section. We have
developed a technique for the design of low-power and low-phase noise frequency synthesizers.
This technique introduces a key parameter, PNUP (Phase Noise per Unit Power), for all the
building blocks of a PLL, correlating the blocks in terms of power and phase noise. It
eases the complicated PLL design when both low power and low phase noise are stringent
requirements, and makes it possible to do system level power and phase noise optimization.
By correlating all the independent PLL blocks together, complicated PLL design and
optimization can be significantly simplified. Using this method effectively reduces the
effort in the design process and results in lower power for a given phase noise specification. We
demonstrate a 6.5 GHz frequency synthesizer design using this technique in a 0.25um
process, achieving a phase noise of -108dBc/Hz at 100kHz offset with only 32.75 mW power
consumption.


Background: 

There are a growing number of applications within wireless and wired communications systems
that require both low-power and low-phase noise frequency synthesizers, and frequency
synthesizers are among the most power hungry components in such communication systems.
Designing a low-power, low-phase noise frequency synthesizer has been a great challenge for
engineers, not only because the loop has several building blocks (Fig. 1) but also because each
building block involves tradeoffs in many dimensions (Fig. 2): power,
phase noise, tuning range, linearity, VCO gain, and area in VCO design, and power, phase noise,
and speed in frequency divider design. In many building blocks power trades off directly against phase
noise. Importantly, low-power and low-phase noise frequency synthesizer design is more than
the sum of excellent independent block designs. Since low phase noise comes at
the expense of power, over-design of some blocks is wasteful in low-power applications.
Therefore, designers require a parameter that can correlate all the building blocks for an
optimal design at a higher level for low power and low phase noise. Fortunately, previous theoretical
studies of the noise properties of PLLs offer great insight into the noise contribution
of each building block to the overall phase noise of the loop. However, without a proper
tool or method, applying such theory to an actual design can be complicated and cumbersome.
The key link is made possible by the design parameter introduced here, PNUP, which applies the noise
theory to the actual design process and thereby simplifies design and optimization. System
level power saving is then achieved by intelligent power allocation among all
building blocks.


Detailed  Description  of  the  Approach: 


PNUP: A New Parameter to Aid Low-Power, Low-Phase-Noise PLL Design
We introduce the design parameter PNUP (Phase Noise per Unit Power). PNUP measures the phase
noise improvement of each component per unit of power spent on it. This enables a direct
comparison of the impact of the power allocated to each component of the loop. It is
mathematically defined as:

(1)

where φ_n,total is the total output phase noise, P_i is the power of component i and H_i(s) is
the transfer function of component i. The metric, PNUP, is a function of offset frequency,
component power consumption, and loop transfer function. We derive the PNUP for the VCO and
for the frequency dividers, respectively.

In the PNUP expression for the VCO, P is the power of the circuit, a scaling factor relates
signal power to circuit power, and F is an experimental fitting factor.

In the PNUP expression for the frequency dividers, the first term is white noise and the second
term is up-converted flicker noise, ω_out is the output frequency of the frequency divider, P is
the power of the circuit, and Ka and Kb are experimental fit factors.

The advantages of the PNUP design parameter can be summarized as follows:

1. PNUP relates all building blocks of the PLL to one another in terms of power and phase noise.

2. The PNUP method enables system-level optimization of power and phase noise.

3. PNUP simplifies the design and optimization process because all independent building
blocks use the same parameter.


Results 

Our original application, within atomic clock like GPS systems, requires phase noise below
-103 dBc/Hz at 100 kHz offset with less than 35 mW of power. For this requirement, the method
predicts that spending less power on the VCO, and accepting moderate VCO phase noise, still
meets the required loop phase noise while saving power.

The measured loop phase noise in Fig. 3 shows a good match with the theory in Fig. 4. Fig. 4
shows that the crossover, where PNUP_VCO equals PNUP_div, occurs at 100 kHz. This indicates
that optimal power allocation is achieved at an offset frequency of 100 kHz. In other words,
this is the best phase noise we can get with this amount of power consumption.
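
The crossover condition can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that each block's contribution to the output phase-noise power falls inversely with the power spent on it (contribution_i = k_i / P_i). The coefficients k_i, the function names and the 20 mW budget are hypothetical and are not taken from the measured design or from the PNUP expressions of this report. Under that assumption, the marginal noise reduction per unit power, k_i / P_i^2, plays the role of PNUP, and the total noise for a fixed budget is smallest when this quantity is equal for all blocks.

```python
import math

def allocate_power(noise_coeffs, budget_mw):
    """Split budget_mw so the marginal noise reduction per mW is equal for all blocks.

    Illustrative model only: contribution_i = k_i / P_i. Minimizing sum(k_i / P_i)
    subject to sum(P_i) = budget gives P_i proportional to sqrt(k_i).
    """
    roots = {name: math.sqrt(k) for name, k in noise_coeffs.items()}
    total = sum(roots.values())
    return {name: budget_mw * r / total for name, r in roots.items()}

def marginal_improvement(k, power_mw):
    """Noise reduction per additional mW: -d(k/P)/dP = k / P**2 (PNUP-like quantity)."""
    return k / power_mw ** 2

def total_noise(noise_coeffs, powers_mw):
    """Sum of the block contributions under the illustrative k_i / P_i model."""
    return sum(k / powers_mw[name] for name, k in noise_coeffs.items())

# Hypothetical coefficients (arbitrary units) and power budget.
coeffs = {"vco": 4.0, "divider": 9.0}
budget = 20.0

optimal = allocate_power(coeffs, budget)
even = {name: budget / len(coeffs) for name in coeffs}

for name, p in optimal.items():
    m = marginal_improvement(coeffs[name], p)
    print(f"{name}: {p:.2f} mW, marginal improvement {m:.4f} per mW")
print(f"total noise, equal-marginal split: {total_noise(coeffs, optimal):.4f}")
print(f"total noise, even split:           {total_noise(coeffs, even):.4f}")
```

In this toy model, moving power away from the equal-marginal point in either direction increases the total noise, which is the sense in which a crossover of the two PNUP curves marks the best use of a fixed power budget.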

When we examine the loop components, the measured power of the VCO and the frequency dividers
is 9.25 mW and 11 mW, respectively, from a 2.5 V supply. The VCO phase noise was measured to be
-90 dBc/Hz at 100 kHz offset. A comparison of VCO performance is listed in Table 1. Our VCO
performs relatively well, even in a process with a larger feature size, when compared to the
other SOI design. Compared to designs in bulk silicon, however, our VCO suffers from higher
phase noise due to the higher flicker noise that is a disadvantage of SOI processes. Increasing
the VCO power to the levels of the other designs would improve the phase noise of this block,
but the PNUP technique dictates that we can sacrifice some phase noise performance here for
low-power optimization of the system. As a result, our VCO achieves the lowest power
consumption among comparable designs, with moderate phase noise performance. When the complete
loop performance is measured (data in Table 2 and Fig. 5a, chip picture in Fig. 5b), however,
to the best of our knowledge the overall system power consumption is among the lowest and the
phase noise is the best compared with designs in a similar operating frequency range in the
same process (0.25 µm) or even more advanced processes (0.18 µm and 90 nm).
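
The margins of the complete loop against the stated requirement can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this section (requirement: below -103 dBc/Hz at 100 kHz offset with less than 35 mW; measured loop: -108 dBc/Hz and 32.75 mW; measured VCO power: 9.25 mW). The variable names are illustrative.

```python
# Requirement and measured loop performance at 100 kHz offset, as quoted in the text.
SPEC_PHASE_NOISE_DBC_HZ = -103.0
SPEC_POWER_MW = 35.0
MEASURED_PHASE_NOISE_DBC_HZ = -108.0
MEASURED_POWER_MW = 32.75
VCO_POWER_MW = 9.25

# Phase-noise margin in dB and as a linear noise-power ratio.
noise_margin_db = SPEC_PHASE_NOISE_DBC_HZ - MEASURED_PHASE_NOISE_DBC_HZ
noise_margin_ratio = 10 ** (noise_margin_db / 10)

# Remaining power headroom and the VCO's share of the total loop power.
power_margin_mw = SPEC_POWER_MW - MEASURED_POWER_MW
vco_share = VCO_POWER_MW / MEASURED_POWER_MW

print(f"phase-noise margin: {noise_margin_db:.1f} dB (factor {noise_margin_ratio:.2f})")
print(f"power headroom: {power_margin_mw:.2f} mW")
print(f"VCO share of loop power: {vco_share:.1%}")
```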


Fig. 3 Phase noise measurement of VCO and PLL


Fig. 4 PNUP of VCO and frequency divider


Table 1 VCO performance

Table 2 Frequency synthesizer performance

Reference   Fosc (GHz)   Power (mW)   Phase Noise @ 100 kHz (dBc/Hz)   Process
[12]        5            35                                            CMOS
[15]                     150                                           0.09 SOI
This work   6.5          32.75        -108                             0.25 SOI

Fig.  5  (a)  Frequency  synthesizer  performance  comparison  (b)  Chip  picture 


Based upon an investigation of the phase noise characteristics of frequency synthesizers, we
present an intelligent power allocation technique for systems with stringent power and phase
noise requirements. We demonstrate a frequency synthesizer at 6.5 GHz that achieves both low
phase noise and low power merely by reallocating loop power. The technique can be used broadly
in PLL design, especially when both a phase noise contour and a power budget must be satisfied.


Topic  3:  Low  power  Oscillators  for  CW  transceivers 

Beating  the  power  limits  of  LC  oscillators 

Through this project we have invented a new voltage-controlled oscillator topology that pushes
the minimum power consumption of the widely used negative-Gm LC oscillator even lower. This
novel negative-Gm boost topology achieves low-power operation comparable to ring oscillators
while maintaining the low phase noise of LC oscillators. The concept is verified with two fully
integrated 1 GHz VCO designs in a 0.13 µm standard CMOS process. The proposed negative-Gm boost
topology starts oscillation with 35% less power than the minimum power needed to start a
conventional negative-Gm LC oscillator.

Background: 

Wireless transceivers for sensor networks and handheld devices present unique design
challenges: these transceivers must be highly integrated, have low phase noise and, most
importantly, consume low power [1]. Voltage-controlled oscillators (VCOs) are among the most important building
blocks  in  the  system.  Ring  oscillators  and  LC  oscillators  are  the  two  most  widely  used  VCO 
topologies  for  these  systems.  Ring  oscillators  are  capable  of  low  power  operation  but  the  phase 
noise  is  disappointing  [2].  LC  oscillators  offer  excellent  phase  noise  performance  but  require 
more  power  to  start  up  an  oscillation.  Furthermore,  in  low  power  systems,  various  power 
management  techniques  require  the  clock  to  be  suspended  and  re-activated  on  the  fly.  As  a 
result,  it  is  not  only  start-up  power,  but  also  start-up  time  that  determines  oscillator  performance. 
A  fast  start-up  oscillator  reduces  the  overall  power  consumption  of  the  system. 

Gm-boosted LC Oscillator Design:

The power required to start an LC oscillator is determined by the power required to overcome
the loss of the LC tank, that is, the power needed to generate a negative resistance that
compensates the parasitic resistance of the tank. For most standard LC oscillator topologies
this compensating resistance is -2/gm. A novel VCO topology is proposed, shown in Fig. 1. By
inserting amplifiers with gain A in the loop, the effective transconductance is boosted by a
factor of A, so the negative resistance becomes -2/(gm*A). The minimum gm required from the
transistors to start oscillation with the same tank is therefore A times smaller. Compensating
the tank loss then requires much less current through the transistor pair, which saves power.
Note that the amplifier power consumption has to be taken into account, so the power of this
amplifier must be kept low. A differential version of this topology is shown in Fig. 2.
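
As a rough numerical illustration of the start-up condition, the sketch below compares the minimum transconductance needed with and without the loop amplifier. It assumes an ideal parallel tank with equivalent resistance Rp = Q * w0 * L and the idealized conditions gm >= 2/Rp for the cross-coupled pair and gm >= 2/(A * Rp) for the Gm-boosted pair. The inductance and the gain A are hypothetical values chosen for the example, and amplifier power is not modeled.

```python
import math

def tank_parallel_resistance(q, freq_hz, inductance_h):
    """Equivalent parallel resistance of a lossy inductor tank: Rp = Q * w0 * L."""
    return q * 2.0 * math.pi * freq_hz * inductance_h

def min_gm_cross_coupled(rp_ohm):
    """Start-up needs |-2/gm| <= Rp, i.e. gm >= 2 / Rp."""
    return 2.0 / rp_ohm

def min_gm_boosted(rp_ohm, gain):
    """With loop gain A the negative resistance is -2/(gm*A), so gm >= 2 / (A * Rp)."""
    return 2.0 / (gain * rp_ohm)

# Q = 4 and 1 GHz are taken from the text; L and A are hypothetical.
Q_TANK = 4.0
F_OSC_HZ = 1e9
L_TANK_H = 10e-9
AMP_GAIN = 2.0

rp = tank_parallel_resistance(Q_TANK, F_OSC_HZ, L_TANK_H)
gm_plain = min_gm_cross_coupled(rp)
gm_boost = min_gm_boosted(rp, AMP_GAIN)

print(f"Rp = {rp:.1f} ohm")
print(f"minimum gm, cross-coupled: {gm_plain * 1e3:.2f} mS")
print(f"minimum gm, Gm-boosted (A = {AMP_GAIN:g}): {gm_boost * 1e3:.2f} mS")
```

The net saving depends on how much power the amplifiers need to provide the gain A at the oscillation frequency, which this sketch deliberately leaves out.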


Fig.  1  Gm-boosted  LC  oscillator  topology 


Fig. 2 Implementation of the Gm-boosted topology with a differential amplifier


Limitations  and  Advantages  of  the  Gm-boosted  Topology: 

The primary limitation of this topology lies in the design of the internal amplifier. The
internal amplifier needs a bandwidth larger than the oscillation frequency and a gain of more
than 1. Choosing RD larger than RP ensures that the negative resistance in the Gm-boosted LC
oscillator is generated more power-efficiently than in a regular cross-coupled LC oscillator.

On the other hand, a large RD value makes the high-speed amplifier more difficult to design.
The upper frequency limit of the topology is the highest frequency at which the amplifier still
provides a gain A > 1.
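
The tension between a large RD and amplifier speed can be sketched with a single-pole amplifier model. This model is an assumption made for illustration, not the amplifier actually used: the DC gain is taken as gm_amp * RD and the pole as 1 / (2 * pi * RD * C_load). The transconductance, load capacitance and RD values below are hypothetical.

```python
import math

def dc_gain(gm_amp_s, rd_ohm):
    """Low-frequency gain of a resistively loaded stage: A0 = gm * RD."""
    return gm_amp_s * rd_ohm

def pole_frequency_hz(rd_ohm, load_cap_f):
    """Output pole of the stage: fp = 1 / (2 * pi * RD * C)."""
    return 1.0 / (2.0 * math.pi * rd_ohm * load_cap_f)

def gain_at(freq_hz, gm_amp_s, rd_ohm, load_cap_f):
    """Gain magnitude of the single-pole model at freq_hz."""
    a0 = dc_gain(gm_amp_s, rd_ohm)
    fp = pole_frequency_hz(rd_ohm, load_cap_f)
    return a0 / math.sqrt(1.0 + (freq_hz / fp) ** 2)

def unity_gain_frequency_hz(gm_amp_s, rd_ohm, load_cap_f):
    """Frequency at which the gain magnitude falls to 1 (upper limit for boosting)."""
    a0 = dc_gain(gm_amp_s, rd_ohm)
    if a0 <= 1.0:
        raise ValueError("DC gain must exceed 1 for the boost to have any effect")
    return pole_frequency_hz(rd_ohm, load_cap_f) * math.sqrt(a0 ** 2 - 1.0)

# Hypothetical amplifier parameters.
GM_AMP_S = 2e-3
C_LOAD_F = 50e-15

for rd in (1e3, 2e3):
    f_pole = pole_frequency_hz(rd, C_LOAD_F)
    f_unity = unity_gain_frequency_hz(GM_AMP_S, rd, C_LOAD_F)
    g_1ghz = gain_at(1e9, GM_AMP_S, rd, C_LOAD_F)
    print(f"RD = {rd:.0f} ohm: pole {f_pole / 1e9:.2f} GHz, "
          f"gain at 1 GHz {g_1ghz:.2f}, unity gain at {f_unity / 1e9:.2f} GHz")
```

In this model a larger RD raises the low-frequency gain but moves the pole down in proportion, so the gain available at the oscillation frequency grows more slowly than RD itself.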


Fig. 3 Continuously expanding the power and phase noise trade-off toward the low-power extreme.
Axis label: Power. Plot annotations: minimum start-up power for the cross-coupled oscillator;
minimum start-up power for the Gm-boosted oscillator; the region between them, where the
cross-coupled oscillator cannot start but the Gm-boosted oscillator can still perform.


Another characteristic of this topology is that the internal amplifier introduces a phase shift
into the loop, forcing the oscillator to oscillate off the resonant peak. Since on-chip
inductors have low Q, this offset has only a minor effect on the oscillator for a small phase
shift, although the output power can be slightly lower than at peak resonance. Furthermore, the
internal amplifier does introduce noise. However, the LC tank filters out most of this noise,
so the added noise is small enough to ignore compared with the noise from the lossy tank. In
this topology power still trades off with phase noise in the same manner as in regular
cross-coupled LC oscillators: as the current going to the tank becomes smaller, the oscillation
amplitude becomes smaller, resulting in a lower signal-to-noise ratio and higher phase noise [3].
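
The effect of the amplifier phase shift can be estimated with a simple parallel-RLC tank model, Z = Rp / (1 + jQ(w/w0 - w0/w)). This is a sketch under that assumption: the loop oscillates where the tank phase cancels the amplifier phase lag. Q = 4 matches the on-chip inductor reported in the Results, while the phase-lag values are hypothetical.

```python
import math

def oscillation_frequency_ratio(q, amp_phase_lag_rad):
    """Return w/w0 at which the tank phase cancels the amplifier phase lag.

    Tank phase is -atan(Q * x) with x = w/w0 - w0/w, so Q * x = -tan(lag).
    Solving r - 1/r = x for the positive root gives r = (x + sqrt(x^2 + 4)) / 2.
    """
    x = -math.tan(amp_phase_lag_rad) / q
    return (x + math.sqrt(x * x + 4.0)) / 2.0

def relative_tank_impedance(q, ratio):
    """|Z| / Rp of the parallel tank at the frequency ratio w/w0."""
    x = ratio - 1.0 / ratio
    return 1.0 / math.sqrt(1.0 + (q * x) ** 2)

Q_TANK = 4.0  # on-chip inductor Q quoted in the Results

for lag_deg in (5.0, 10.0, 20.0):  # hypothetical amplifier phase lags
    lag = math.radians(lag_deg)
    ratio = oscillation_frequency_ratio(Q_TANK, lag)
    z_rel = relative_tank_impedance(Q_TANK, ratio)
    print(f"lag {lag_deg:4.1f} deg: f/f0 = {ratio:.4f}, |Z|/Rp = {z_rel:.4f}")
```

In this model the tank impedance at the oscillation frequency drops by the cosine of the phase lag, so a small lag costs little amplitude, consistent with the slight reduction in output power noted above.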

The contribution of this work is to extend this tradeoff beyond the low-power boundary of
regular cross-coupled LC oscillators, as shown in Fig. 3. The regular cross-coupled LC
oscillator stops operating at power level P1, but the Gm-boosted LC oscillator continues to
operate down to a lower power level, P2. Both LC oscillators have much lower phase noise than
ring oscillators at the same frequency and power level.

Results:

Fig. 4 Startup Time Measurement Histogram

Table 1 Oscillator Results Comparison


Two 1 GHz VCO designs, one in each topology and both driving the same LC tank, have been
designed, fabricated and measured side by side in a 0.13 µm standard CMOS process. The LC tank
is formed with an on-chip inductor (Q = 4) and a varactor with a 20% frequency tuning range [4].
With the same LC tank, the conventional negative-Gm LC oscillator needs 824 µW from a 1.3 V
supply to start oscillation, whereas the boosted topology needs only 533 µW (299 µW for the
cross-coupled pair and 234 µW for the amplifiers). When operating at the same power level, the
cross-coupled oscillator takes an average of 215 ns to start oscillation but the Gm-boosted
oscillator takes only an average of 160 us. The start-up time was sampled statistically, and
the histogram is shown in Fig. 4. The phase noise of the conventional topology is -105 dBc/Hz
at 100 kHz offset, and that of the boosted topology is -100.5 dBc/Hz at 100 kHz offset. A
4-stage tunable ring oscillator using differential amplifiers consumes approximately 293 µW and
has a phase noise of -90 dBc/Hz at 1 MHz offset. Compared with this low-power alternative at a
similar power level, the Gm-boosted topology shows a significant advantage in phase noise.
Table 1 lists the results for comparison, including a ring oscillator design in the same
0.13 µm standard CMOS process. All three types of oscillator follow the phase noise vs. power
tradeoff curve, with more power resulting in better phase noise. The ring oscillator can
oscillate at a lower power level but with significantly worse phase noise. The regular
cross-coupled oscillator stopped working when the power consumption was reduced to its minimum
start-up power level. The Gm-boosted oscillator, however, fills the gap and still operates at a
reduced power level (lower than the minimum start-up power of the regular cross-coupled
oscillator). Further results of this work can be found in [5,6].
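
The start-up power figures can be cross-checked with a short calculation. The sketch below uses only the measured values quoted in this section; the supply currents are derived from them assuming the 1.3 V supply applies to both oscillators, and the variable names are illustrative.

```python
# Measured minimum start-up power, in microwatts, as quoted in the text.
CONVENTIONAL_UW = 824.0
BOOSTED_PAIR_UW = 299.0
BOOSTED_AMPLIFIERS_UW = 234.0
SUPPLY_V = 1.3  # assumed common to both oscillators

boosted_total_uw = BOOSTED_PAIR_UW + BOOSTED_AMPLIFIERS_UW
saving_fraction = 1.0 - boosted_total_uw / CONVENTIONAL_UW

# Equivalent supply currents in microamps (P = V * I).
conventional_ua = CONVENTIONAL_UW / SUPPLY_V
boosted_ua = boosted_total_uw / SUPPLY_V

print(f"Gm-boosted total: {boosted_total_uw:.0f} uW")
print(f"start-up power saving: {saving_fraction:.1%}")
print(f"supply current: {conventional_ua:.0f} uA conventional, {boosted_ua:.0f} uA boosted")
```

The two components of the boosted design add up to the quoted 533 µW, and the resulting saving agrees with the 35% figure stated for this topology.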
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